Abstract

We study a generalization of the evolution model proposed by
Guiol, Machado and Schinazi (2010). In our model, at each moment
of time a random number of species is either born or removed from
the system; the species to be removed are those with the lowest
fitnesses, fitnesses being numbers in $[0,1]$. We show that under
some conditions the set of species approaches (in a sense made precise in Definition 1) a sample
from the uniform distribution on $[f, 1]$ for some $f \in [0, 1)$, and that the
total number of species forms a recurrent process in most other cases.

1 Bak-Sneppen and Guiol-Machado-Schinazi models

In recent years the modeling of biological evolution has received a lot
of attention in the literature. Many models have been proposed to explain and
understand how nature works. A question common to most
research in this field is why some species survive while others in the
same ecosystem go extinct.
One of the models proposed for this purpose is the Bak-Sneppen
model (BS), introduced by Per Bak and Kim Sneppen in 1993.
The basic idea of their work was to build a model with a
criterion representing the strength or resistance of each vertex in the
ecosystem. This criterion is called fitness. The fitness of a vertex is usually
related to its genetic code. In the initial version of their model, the vertex
with the weakest fitness is simply replaced by a new one. However, this leads to no
interaction between the vertices, and hence that version did not receive
much interest from either the biological or the mathematical point of view.
To include interaction in their model, they suggested that a "weak"
vertex leaving the system also affects the vertices connected
to it, which are removed from the system as well.

In particular, the BS model consists of an "ecosystem" that contains
a fixed number $N$ of vertices located on a circle. A number between 0 and 1,
representing its fitness, is assigned to each vertex. At each time step the vertex
with the lowest fitness is replaced by a new one with a random fitness in the
interval $[0,1]$. At the same time its two neighbours are also replaced by two
new vertices with random fitnesses in $[0, 1]$. This way no vertex can secure its
survival, no matter how "strong" its fitness is.

In their 1993 paper Bak and Sneppen showed that their model has the
properties of self-organized criticality and punctuated equilibrium. In later
years more results were proved; for example, Meester and
Znamenski (2003, 2004) studied the limit behaviour of the fitnesses, including
a discrete version of the model, and showed that the mean fitness is less
than 1 in the discrete case, which confirms that the behaviour is indeed
nontrivial. This is also supported by simulations of the model, which
suggest that the limiting distribution of the collection of fitnesses is uniform over
the interval $[f, 1]$ for some $f$ believed to be close to $2/3$. However,
no theoretical proof of this behaviour has been found so far.

In 2010 Guiol, Machado and Schinazi considered another stochastic model
of evolution (we will refer to it as the GMS model) as an alternative to the BS
model, since they believed that the setup of the BS model was somewhat artificial
and did not represent nature well. In the GMS model, the process starts
with an empty set of vertices in $[0, 1]$. At each step, with probability $p$
a new vertex is born (birth case), and with probability $q = 1 - p$ one vertex
is removed (death case). Each vertex that enters the system is assigned a
fitness value, which is an independent random variable uniformly distributed
on $[0, 1]$. In the death case the vertex with the lowest fitness is removed from
the system. In Guiol et al. (2010), it was proved that the set of vertices with
fitness higher than the critical value $f_c = q/p$ eventually approaches
a uniform distribution on the corresponding interval, with an error of
order less than $n^{1/2+\varepsilon}$ for any $\varepsilon > 0$. Note that this mimics the behaviour
which is expected to hold for the BS model.
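The GMS behaviour is easy to observe by simulation. The following Python sketch is our own illustration, not code from Guiol et al. (2010); the function name `simulate_gms`, the seed and the parameter values are hypothetical, and we assume that a death event does nothing when the system is empty. The living fitnesses are kept in a binary min-heap so that the weakest vertex can be removed in logarithmic time. For $p = 0.75$ the critical value is $f_c = q/p = 1/3$, so the surviving fitnesses should look roughly like a uniform sample on $[1/3, 1]$.

```python
import heapq
import random


def simulate_gms(p: float, steps: int, seed: int = 0) -> list[float]:
    """Run the GMS model for `steps` steps, starting from the empty set.

    With probability p one vertex with a U[0, 1] fitness is born; otherwise the
    vertex with the lowest fitness is removed (if the system is non-empty).
    Returns the fitnesses of the vertices alive at the end, in heap order.
    """
    rng = random.Random(seed)
    alive: list[float] = []  # min-heap: alive[0] is the weakest vertex
    for _ in range(steps):
        if rng.random() < p:
            heapq.heappush(alive, rng.random())
        elif alive:
            heapq.heappop(alive)
    return alive


p = 0.75
f_c = (1 - p) / p  # critical fitness q/p = 1/3
alive = simulate_gms(p, 200_000)
frac_below = sum(x < f_c for x in alive) / len(alive)
mean_fitness = sum(alive) / len(alive)
print(len(alive), round(frac_below, 4), round(mean_fitness, 3))
```

Only a vanishing fraction of the survivors should lie below $f_c$, and their mean should be close to $(1 + f_c)/2 = 2/3$.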

There are two basic differences between the two models. In the GMS
model the number of vertices in the system is random (rather than fixed, as in the
BS model), which seems to be a more realistic approach to an evolutionary
model. The second difference is that in the GMS model only the weakest
vertex is removed at each death event, hence there is no interaction among the
vertices of the ecosystem. This means that a "strong" vertex is more likely
to survive in the GMS model than in the BS model.

Recently, finer results for the GMS model were obtained by Ben-Ari et
al. (2011), including a $\log\log n$ correction term. Guiol et al. (2011)
also discovered a link between the survival time in an evolution model and
the Bessel distributions.

The Guiol et al. (2010) paper motivated us to consider an extension of the
GMS model in which both the number of newborn and the number of removed vertices
are random. This makes the model more realistic in describing nature,
as well as providing us with some non-trivial mathematical challenges. In
Section 2 we assume that the number of deaths is a bounded random variable
and obtain results similar to those in Guiol et al. (2010); this assumption
is removed in Section 3, where we study the most general case.

2 Multiple random births and deaths at each step

In our paper we assume that at each step the numbers of vertices being
born or removed are random. Namely, suppose that $X$ and $Z$ are two
positive integer-valued random variables, $X_n$ ($Z_n$ resp.) are i.i.d. random
variables with the distribution of $X$ ($Z$ resp.), and the $X_n$'s and $Z_n$'s are all
independent. Fix $p \in (0, 1)$ and set $q = 1 - p$. At time $n$, the state of the
system is a finite set $T_n$ of vertices in $[0, 1]$. By the fitness of a vertex we
understand its location in the segment $[0, 1]$. Note that this setup covers the
GMS model if we set $X = Z = 1$.

The system starts with an empty set, $T_0 = \emptyset$. At time $n$, with probability
$p$ we generate $Z_n$ new vertices, each having a fitness uniformly distributed
over $[0, 1]$ independently of each other and of everything else, so that $|T_{n+1}| =
|T_n| + Z_n$; otherwise, with probability $q = 1 - p$, we remove the $X_n$ vertices with
the $X_n$ smallest fitnesses, with the convention that if there are fewer than $X_n$
vertices in the system, the system becomes empty; as a result, $|T_{n+1}| =
\max\{|T_n| - X_n, 0\}$ in this case. Under some assumptions on the distributions of $X$
and $Z$ we will derive results on the long-term behaviour of the system.
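The dynamics just described translate directly into a short simulation. The sketch below is a minimal Python illustration under our own assumptions: the function `simulate` and its sampler arguments are hypothetical names, and the laws of $X$ and $Z$ in the example ($Z$ uniform on $\{1,2,3\}$, $X$ uniform on $\{1,2\}$) are chosen only for illustration.

```python
import heapq
import random
from typing import Callable

Sampler = Callable[[random.Random], int]  # returns a positive integer


def simulate(p: float, sample_z: Sampler, sample_x: Sampler,
             steps: int, seed: int = 0) -> list[float]:
    """Run the generalized model for `steps` steps, starting from T_0 = empty set.

    Birth (probability p): Z_n new vertices with i.i.d. U[0, 1] fitnesses.
    Death (probability q = 1 - p): remove the X_n vertices with the smallest
    fitnesses, or empty the system if it has fewer than X_n vertices.
    """
    rng = random.Random(seed)
    alive: list[float] = []  # min-heap of the fitnesses in T_n
    for _ in range(steps):
        if rng.random() < p:
            for _ in range(sample_z(rng)):
                heapq.heappush(alive, rng.random())
        else:
            for _ in range(min(sample_x(rng), len(alive))):
                heapq.heappop(alive)
    return alive


# Illustrative laws: Z uniform on {1, 2, 3}, X uniform on {1, 2}.
final = simulate(0.7, lambda r: r.randint(1, 3), lambda r: r.randint(1, 2), 50_000)
print(len(final))
```

With these laws $|T_n|$ grows on average by $p\mu_Z - q\mu_X = 1.4 - 0.45 = 0.95$ per step, so roughly $47\,500$ vertices should be alive at the end.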

First, for a constant $f \in (0, 1)$, define $L_n$, $R_n$ and $R'_n$ as follows:

$L_n$: the set of vertices alive in the system at time $n$ whose fitnesses lie in
$[0, f)$;

$R_n$: the set of vertices alive in the system at time $n$ whose fitnesses lie in
$[f, 1]$;

$R'_n$: the set of vertices that were born in the system from time 0 to $n$ and
were assigned a fitness in $[f, 1]$.

Obviously, $R_n \subseteq R'_n$.

Definition 1. Suppose that $A_1, A_2, A_3, \ldots$ is an infinite sequence of
sets, each consisting of a finite number of points in $\mathbb{R}$. We say that $A_n$
approaches a random sample from distribution $F$ if, with probability 1, there
exists another sequence of sets $B_1 \subseteq B_2 \subseteq B_3 \subseteq \cdots$ such that (i) each of
these sets is a finite collection of i.i.d. random variables with common
distribution $F$; (ii) $|B_n| \to \infty$ as $n \to \infty$; and (iii) $|A_n \,\triangle\, B_n| = o(|B_n|)$ as
$n \to \infty$. Here $A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$.
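Condition (iii) is easy to evaluate for finite sets. The helper below (a hypothetical name, for illustration only) computes the ratio $|A_n \triangle B_n|/|B_n|$, which Definition 1 requires to tend to 0; Python's `^` operator on sets is exactly the symmetric difference.

```python
def approach_ratio(a: set, b: set) -> float:
    """Return |A Δ B| / |B|, the quantity that must be o(1) in Definition 1."""
    if not b:
        raise ValueError("B must be non-empty")
    return len(a ^ b) / len(b)


print(approach_ratio({1, 2, 3}, {2, 3, 4, 5}))  # symmetric difference {1, 4, 5} -> 0.75
```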

Let
$$p_c = \frac{\mu_X}{\mu_X + \mu_Z}. \tag{1}$$

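Theorem 1. Assume that there is an integer constant $M > 0$ such that
$X \le M$ a.s., and that $\mathbb{E}(Z^2) < \infty$. Let $\mu_X = \mathbb{E}(X)$ and $\mu_Z = \mathbb{E}(Z)$. Also
suppose that $p \in (p_c, 1)$ and let
$$f = \frac{q\,\mu_X}{p\,\mu_Z} \in (0, 1). \tag{2}$$
Then, for every $\varepsilon > 0$, there is a constant $C > 0$ such that, with probability 1, there is an $n_0 \in \mathbb{N}$ with
$$0 \le |R'_n| - |R_n| < C\, n^{1/2+\varepsilon} \quad \text{for all } n \ge n_0.$$
Moreover, $T_n$ approaches a random sample from $U[f, 1]$.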
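As a sanity check on (1) and (2), the following sketch computes $p_c$ and $f$ with exact rational arithmetic and verifies that the drift $pf\mu_Z - q\mu_X$ appearing in (3) vanishes. The function name and the example moments ($\mu_X = 3/2$, $\mu_Z = 2$, $p = 7/10$) are illustrative assumptions.

```python
from fractions import Fraction


def critical_values(mu_x: Fraction, mu_z: Fraction, p: Fraction) -> tuple[Fraction, Fraction]:
    """Return (p_c, f) from (1) and (2): p_c = mu_x / (mu_x + mu_z), f = q mu_x / (p mu_z)."""
    q = 1 - p
    p_c = mu_x / (mu_x + mu_z)
    f = q * mu_x / (p * mu_z)
    return p_c, f


mu_x, mu_z, p = Fraction(3, 2), Fraction(2), Fraction(7, 10)
p_c, f = critical_values(mu_x, mu_z, p)
drift = p * f * mu_z - (1 - p) * mu_x  # E[W_n] in (3)
print(p_c, f, drift)  # 3/7 9/28 0
```

Taking $X = Z = 1$ recovers the GMS values $p_c = 1/2$ and $f = q/p$, and at $p = p_c$ one gets exactly $f = 1$, which is why the theorem requires $p > p_c$.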

Proof. The general skeleton of the proof is similar to that in [3], although 
our model requires a deeper analysis. 



First, look at those times when $|L_n| \ge M$. In a death event, $R_n$ is then
unaffected, since all removed vertices (at most $M \le |L_n|$ of them) come from $L_n$.
Hence, for those times
$$\mathbb{E}\left(|L_{n+1}| - |L_n| \mid \mathcal{F}_n\right) = \mathbb{E}[W_n] = p f \mu_Z - q \mu_X = 0, \tag{3}$$
where $\mathcal{F}_n$ is the sigma-algebra generated by the process up to time $n$, and the
distribution of the random variable $W_n$ is given by
$$W_n = \begin{cases} \mathrm{Binomial}(Z_n, f) & \text{with probability } p, \\ -X_n & \text{with probability } q. \end{cases} \tag{4}$$
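The zero-drift identity (3) can also be checked by Monte Carlo: the sketch below samples $W_n$ from (4) and averages. The laws $Z$ uniform on $\{1,2,3\}$ and $X$ uniform on $\{1,2\}$ (so $M = 2$) and $p = 0.7$ are illustrative assumptions; with them (2) gives $f = 9/28$. The function name `sample_w` is hypothetical.

```python
import random


def sample_w(rng: random.Random, p: float, f: float) -> int:
    """Draw W_n as in (4), with Z uniform on {1,2,3} and X uniform on {1,2} (illustrative laws)."""
    if rng.random() < p:
        z = rng.randint(1, 3)
        return sum(rng.random() < f for _ in range(z))  # Binomial(Z_n, f)
    return -rng.randint(1, 2)  # -X_n


rng = random.Random(1)
p, f = 0.7, 9 / 28
n = 200_000
mean_w = sum(sample_w(rng, p, f) for _ in range(n)) / n
print(f"empirical E[W] = {mean_w:.4f}")
```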

On the other hand, at the times when $|L_n| < M$, some vertices may be
removed from $R_n$ in a death event, resulting in $-M \le |R_{n+1}| - |R_n| \le 0$.
Define
$$t_n = \left|\{1 \le k \le n : |L_k| < M\}\right|,$$
the number of such "bad" times. We will show that, almost surely, $t_n = O(n^{1/2+\varepsilon})$. Let
$$k_n = \left|\{1 \le k \le n : |L_{k-1}| \ge M \text{ and } |L_k| < M\}\right|.$$

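For any $\mu > 0$,
$$\mathbb{P}\left(t_n > 2\mu n^{1/2+\varepsilon}\right) \le \mathbb{P}\left(t_n > 2\mu n^{1/2+\varepsilon};\ k_n \le n^{1/2+\varepsilon}\right) + \mathbb{P}\left(k_n > n^{1/2+\varepsilon}\right) = (I) + (II). \tag{5}$$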

First, we choose an appropriate $\mu$ and obtain an upper bound
on (I). Set $E_1 = 0$ and for $i = 1, 2, \ldots$ recursively define
$$G_i = \min\{k > E_i : |L_k| \ge M\}, \qquad E_{i+1} = \min\{k > G_i : |L_k| < M\}.$$
Then the bad times are covered by the intervals $[E_i, G_i)$,
$$\{k \ge 1 : |L_k| < M\} \subseteq \bigcup_{i=1}^{\infty} [E_i, G_i),$$
and $\max\{i : E_i \le n\} = k_n + 1$.

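Let $\lfloor \cdot \rfloor$ denote the integer part of a number. Observe that the $G_i - E_i$ are
stochastically smaller than i.i.d. non-negative random variables $\xi_i$ with the
distribution given by
$$\mathbb{P}(\xi_i \ge m) = \left(1 - (pf)^M\right)^{\lfloor m/M \rfloor}, \quad m = 0, 1, 2, \ldots,$$
since for $|L_k|$ to reach $M$, even starting from 0, it suffices to have $M$
consecutive birth events in each of which at least one of the new particles is located in
$[0, f)$, and each such event has probability at least $pf$. Let $\mu = \mathbb{E}\,\xi_i < \infty$ and $m_n = \lfloor n^{1/2+\varepsilon} \rfloor$.
Then
$$\begin{aligned}
(I) &= \mathbb{P}\left(t_n > 2\mu n^{1/2+\varepsilon};\ k_n \le n^{1/2+\varepsilon}\right)
\le \mathbb{P}\left(\sum_{i=1}^{k_n+1} [G_i - E_i] > 2\mu n^{1/2+\varepsilon};\ k_n \le n^{1/2+\varepsilon}\right) \\
&\le \mathbb{P}\left(\sum_{i=1}^{m_n+1} [G_i - E_i] > 2\mu n^{1/2+\varepsilon}\right)
\le \mathbb{P}\left(\sum_{i=1}^{m_n+1} \xi_i > 2\mu n^{1/2+\varepsilon}\right)
\le \mathbb{P}\left(\sum_{i=1}^{m_n+1} \xi_i > 2\mu m_n\right).
\end{aligned}$$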

At this point we use a large deviation estimate which follows immediately
from Lemma 9.4 in Section 1.9 of Durrett (1996):

Lemma 1. Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sequence of random variables with
$\mu := \mathbb{E}X_i$ and $\varphi(\vartheta) := \mathbb{E}(e^{\vartheta X_i}) < \infty$ for some positive $\vartheta$. Let $\kappa(\vartheta) = \log \varphi(\vartheta)$
and $S_n = X_1 + X_2 + \cdots + X_n$. Then for $a > \mu$,
$$\mathbb{P}(S_n \ge na) \le \exp\{-n(a\vartheta - \kappa(\vartheta))\}.$$
Moreover, for $\vartheta > 0$ small enough we have $a\vartheta - \kappa(\vartheta) > 0$.
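To see Lemma 1 at work, the sketch below compares the exact tail $\mathbb{P}(S_n \ge na)$ of a sum of Bernoulli($\rho$) variables with the bound $\exp\{-n(a\vartheta - \kappa(\vartheta))\}$, where $\kappa(\vartheta) = \log(1 - \rho + \rho e^{\vartheta})$. The Bernoulli summands, the grid search over $\vartheta$ and the numbers $\rho = 0.3$, $a = 0.5$ are illustrative assumptions; in the proof the lemma is applied to the $\xi_i$ instead.

```python
import math


def kappa(theta: float, rho: float) -> float:
    """Log moment generating function of a Bernoulli(rho) variable."""
    return math.log(1 - rho + rho * math.exp(theta))


def chernoff_bound(n: int, a: float, rho: float) -> float:
    """Best bound exp(-n (a*theta - kappa(theta))) over a grid of theta in (0, 5]."""
    best = max(a * t - kappa(t, rho) for t in (k / 1000 for k in range(1, 5001)))
    return math.exp(-n * best)


def exact_tail(n: int, a: float, rho: float) -> float:
    """P(S_n >= n a) for S_n ~ Binomial(n, rho)."""
    k0 = math.ceil(n * a)
    return sum(math.comb(n, k) * rho**k * (1 - rho) ** (n - k) for k in range(k0, n + 1))


rho, a = 0.3, 0.5
for n in (10, 50, 100):
    print(n, exact_tail(n, a, rho), chernoff_bound(n, a, rho))
```

Since the bound is just Markov's inequality applied to $e^{\vartheta S_n}$, it holds for every $\vartheta > 0$; the grid only picks a good one.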
By applying this lemma (with, e.g., $a = 3\mu/2$) to the sequence $\xi_i$ (note that $\mathbb{E}(e^{\vartheta \xi_i}) < \infty$ for
sufficiently small $\vartheta > 0$, since the tail of $\xi_i$ decays geometrically), we obtain that there exists $\theta > 0$ such that
$$(I) \le \exp\{-\theta n^{1/2+\varepsilon}\} \quad \text{for all sufficiently large } n. \tag{6}$$
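For concreteness, the tail formula for $\xi_i$ gives $\mu = \mathbb{E}\,\xi_i = \sum_{m \ge 1} \mathbb{P}(\xi_i \ge m) = M/(pf)^M - 1$ (group the terms in blocks of length $M$), and $\mathbb{E}(e^{\vartheta \xi_i}) < \infty$ exactly when $e^{\vartheta M}(1 - (pf)^M) < 1$. The sketch below checks the closed form for $\mu$ numerically and computes that threshold for $\vartheta$; the parameters $p = 0.7$, $f = 9/28$, $M = 2$ and the function names are illustrative assumptions.

```python
import math


def tail(m: int, p: float, f: float, M: int) -> float:
    """P(xi >= m) = (1 - (p f)^M) ** floor(m / M)."""
    return (1 - (p * f) ** M) ** (m // M)


def mean_xi(p: float, f: float, M: int, m_max: int = 100_000) -> float:
    """E xi = sum_{m >= 1} P(xi >= m), truncated at m_max."""
    return sum(tail(m, p, f, M) for m in range(1, m_max + 1))


p, f, M = 0.7, 9 / 28, 2
a = (p * f) ** M
closed_form = M / a - 1
print(mean_xi(p, f, M), closed_form)

# E exp(theta xi) is finite iff exp(theta M) (1 - a) < 1, i.e. theta < -log(1 - a) / M.
theta_max = -math.log(1 - a) / M
print(theta_max)
```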



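Next, we bound (II). Write $m_n = \lfloor n^{1/2+\varepsilon} \rfloor$. On the event $\{k_n > n^{1/2+\varepsilon}\}$ we have $E_{i+1} \le n$, and hence $E_{i+1} - G_i < n$, for $i = 1, \ldots, m_n$. Therefore
$$(II) = \mathbb{P}\left(k_n > n^{1/2+\varepsilon}\right) \le \mathbb{P}\left(E_{i+1} - G_i < n \text{ for } i = 1, \ldots, m_n\right).$$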

At the same time, for each $i \ge 1$, $E_{i+1} - G_i$ is stochastically larger than
a random variable $\tau_i \ge 1$, where the $\tau_i$'s are independent, each having the
distribution of
$$\tau = \min\{j \ge 1 : W_1 + W_2 + \cdots + W_j < 0\}.$$
Here the $W_i$ are i.i.d. random variables with the distribution given by (4). Indeed, $|L_{G_i}| \ge M$, and as long as $|L_k| \ge M$ the increments of $|L_k|$ are distributed as in (4), so $|L_k|$ cannot drop below $M$ before the partial sums of these increments become negative. To
estimate $\mathbb{P}(\tau < n)$ we use results from the general theory of random walks
given in Feller (1966), volume 2 (Theorem 1a in Chapter XII.8 and Theorem
1 in Chapter XVIII.5, respectively).

Theorem 2. Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sequence of random variables
with distribution $F$ such that $0 < F(0) < 1$. Define the sums $S_i$, $i \in
\{0, 1, \ldots, n\}$, by $S_0 = 0$ and $S_i = X_1 + X_2 + \cdots + X_i$, and let
$$K_n = \min\left\{j : S_j = \max_{i \in [0, n]} S_i\right\}$$
be the index of the first maximum among $S_0, S_1, \ldots, S_n$. If the series
$$\sum_{k=1}^{\infty} \frac{1}{k}\left[\mathbb{P}(S_k > 0) - \frac{1}{2}\right] \tag{7}$$
converges, then for $0 \le k \le n$,
$$\mathbb{P}(K_n = k) \sim \binom{2k}{k}\binom{2n-2k}{n-k}\frac{1}{2^{2n}},$$
where $a_n \sim b_n$ means that $\lim_{n\to\infty} a_n/b_n = 1$.

Theorem 3. Consider the notation of Theorem 2 and suppose that its
conditions hold. If $\mathbb{E}X_i = 0$ and $\mathbb{E}(X_i^2) = \sigma^2 < \infty$, then the series (7) is at
least conditionally convergent.

Now we can apply Theorems 2 and 3 with $X_i = -W_i$, since $\mathbb{E}W_i = 0$
by (3), $\mathbb{E}W_i^2 < \infty$ because $X$ is bounded and $\mathbb{E}(Z^2) < \infty$, and $F(0) = \mathbb{P}(W_i \ge 0) = p \in (0, 1)$.
Consequently

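$$\mathbb{P}(\tau > n) = \mathbb{P}(X_1 \le 0,\ X_1 + X_2 \le 0,\ \ldots,\ X_1 + \cdots + X_n \le 0) = \mathbb{P}(K_n = 0) \sim \binom{2n}{n}\frac{1}{2^{2n}} = \frac{(2n)!}{2^{2n}(n!)^2} \sim \frac{1}{\sqrt{\pi n}},$$
where we used Stirling's formula in the last step. (The argument only needs $\mathbb{P}(\tau > n) \ge c/\sqrt{n}$ for some $c > 0$ and all large $n$; the value of the constant plays no role in (8).) Combining the above
calculations, with $m_n = \lfloor n^{1/2+\varepsilon} \rfloor$, we have
$$(II) \le \left(\mathbb{P}(\tau < n)\right)^{m_n} \le \left(1 - \mathbb{P}(\tau > n)\right)^{m_n} = \left(1 - \frac{1 + o(1)}{\sqrt{\pi n}}\right)^{m_n} = \exp\left\{-(1 + o(1))\frac{m_n}{\sqrt{\pi n}}\right\}.$$
Therefore, there is a constant $\alpha > 0$ such that for all large $n$
$$(II) \le e^{-\alpha n^{\varepsilon}}. \tag{8}$$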
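The Stirling step above, $\binom{2n}{n}2^{-2n} \sim 1/\sqrt{\pi n}$, is easy to confirm numerically with exact integer arithmetic. The short sketch below (ours, for illustration; `central_ratio` is a hypothetical name) evaluates $\sqrt{\pi n}\,\binom{2n}{n}2^{-2n}$, which should approach 1 from below; it is in fact about $1 - 1/(8n)$.

```python
import math


def central_ratio(n: int) -> float:
    """sqrt(pi n) * C(2n, n) / 2^(2n), using exact integers before the final division."""
    return math.sqrt(math.pi * n) * (math.comb(2 * n, n) / 4**n)


for n in (10, 100, 1000, 10_000):
    print(n, central_ratio(n))
```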
Hence, plugging (6) and (8) into (5) we obtain
$$\sum_{n=1}^{\infty} \mathbb{P}\left(t_n > 2\mu n^{1/2+\varepsilon}\right) < \infty,$$
and therefore, by the Borel-Cantelli lemma, a.s. there is an $n_0$ such that $t_n \le
2\mu n^{1/2+\varepsilon}$ for all $n \ge n_0$.

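Since $|R'_{n+1}| - |R'_n| = |R_{n+1}| - |R_n| + \Delta_n$, where $\Delta_n = 0$ if $|L_n| \ge M$ and
$\Delta_n \in \{0, 1, 2, \ldots, M\}$ if $|L_n| < M$, we have
$$\begin{aligned}
|R'_n| - |R_n| &= \sum_{k=0}^{n-1} \left(|R'_{k+1}| - |R'_k|\right) - \sum_{k=0}^{n-1} \left(|R_{k+1}| - |R_k|\right) \\
&= \sum_{k=0}^{n-1} \left[\left(|R'_{k+1}| - |R'_k|\right) - \left(|R_{k+1}| - |R_k|\right)\right] \mathbf{1}_{\{|L_k| < M\}} \le M(t_n + 1) \le 2\mu M n^{1/2+\varepsilon} + M
\end{aligned}$$
for all $n \ge n_0$ (the extra term accounts for $k = 0$, since $|L_0| = 0 < M$).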

Finally, to obtain the last statement of Theorem 1, observe that $R'_n$ is a
collection of i.i.d. random variables from $U[f, 1]$, and
$$|T_n \,\triangle\, R'_n| = |L_n| + |R'_n \setminus R_n|.$$
On the one hand, we have $|R'_n \setminus R_n| \le C n^{1/2+\varepsilon}$ for large $n$. On the other hand,
$$\lim_{n\to\infty} \frac{|R'_n|}{n} = p\mu_Z(1 - f) \text{ a.s.} \quad\text{and}\quad \limsup_{n\to\infty} \frac{|L_n|}{n} \le \mathbb{E}W_n = 0 \text{ a.s.}$$
by the strong law (see [3]). Therefore $|R'_n| \to \infty$ a.s. and
$$\lim_{n\to\infty} \frac{|T_n \,\triangle\, R'_n|}{|R'_n|} = \lim_{n\to\infty} \frac{|L_n|/n}{|R'_n|/n} + \lim_{n\to\infty} \frac{|R'_n \setminus R_n|/n}{|R'_n|/n} = 0 \text{ a.s.}$$
Thus $T_n$ approaches a random sample from $U[f, 1]$. □
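The conclusion of Theorem 1 can be illustrated by simulation. The sketch below is our own illustration with hypothetical laws $Z$ uniform on $\{1,2,3\}$ and $X$ uniform on $\{1,2\}$ (so $M = 2$), $p = 0.7$ and hence $f = 9/28$ by (2). It tracks both $T_n$ and $R'_n$ and reports $|R'_n \setminus R_n|$ and $|T_n \triangle R'_n|/|R'_n|$. The fitness values themselves serve as vertex identities, which we assume is harmless because ties occur with probability 0.

```python
import heapq
import random


def run(p: float, f: float, steps: int, seed: int = 0) -> tuple[set[float], set[float]]:
    """Simulate the model with Z ~ U{1,2,3}, X ~ U{1,2}; return (T_n, R'_n).

    R'_n collects every vertex ever born with fitness in [f, 1].
    """
    rng = random.Random(seed)
    alive: list[float] = []         # min-heap of fitnesses in T_n
    born_high: set[float] = set()   # R'_n
    for _ in range(steps):
        if rng.random() < p:
            for _ in range(rng.randint(1, 3)):  # Z_n new vertices
                x = rng.random()
                heapq.heappush(alive, x)
                if x >= f:
                    born_high.add(x)
        else:
            for _ in range(min(rng.randint(1, 2), len(alive))):  # X_n deaths
                heapq.heappop(alive)
    return set(alive), born_high


p, f = 0.7, 9 / 28
T, R_prime = run(p, f, 100_000)
R = {x for x in T if x >= f}
ratio = len(T ^ R_prime) / len(R_prime)
print(len(R_prime - R), ratio)
```

Here $|R'_n \setminus R_n|$ should be small compared with $|R'_n| \approx 0.95\,n$, in line with the $O(n^{1/2+\varepsilon})$ bound.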



3 Number of deaths unbounded 

In this section we generalize the model to the case when $X$ is not necessarily
bounded. We will show that finiteness of $\mathbb{E}X$ is essentially a necessary
and sufficient condition for $T_n$ to approach a random sample from a uniform
distribution.

First, we prove a simple fact about the expectation of a non-negative
integer-valued random variable.
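Lemma 2. Let $X$ be a non-negative integer-valued random variable. Then $\mathbb{E}X < \infty$
if and only if for every $c > 0$
$$\sum_{n=1}^{\infty} \mathbb{P}(X \ge cn) < \infty.$$

Proof. Let $\mu = \mathbb{E}X = \sum_{n=1}^{\infty} \mathbb{P}(X \ge n) \in [0, \infty]$ and $S_c = \sum_{n=1}^{\infty} \mathbb{P}(X \ge cn)$; note that $S_c$ is non-increasing in $c$. First, suppose that $c < 1$.
Then there exists an integer $m > 0$ such that $1/m < c$. We have
$$S_c \le \sum_{n=1}^{\infty} \mathbb{P}\left(X \ge \frac{n}{m}\right) = \sum_{n=1}^{\infty} \mathbb{P}(mX \ge n) = \mathbb{E}(mX) = m\mu. \tag{9}$$
Now if $c > 1$, there exists an integer $m > c$. Then
$$S_c = -1 + \sum_{n=0}^{\infty} \mathbb{P}(X \ge cn) \ge -1 + \sum_{n=0}^{\infty} \mathbb{P}(X \ge mn) \ge -1 + \sum_{n=0}^{\infty} \frac{1}{m}\sum_{k=0}^{m-1} \mathbb{P}(X \ge mn + k) = -1 + \frac{1}{m}\sum_{j=0}^{\infty} \mathbb{P}(X \ge j) = \frac{\mu + 1}{m} - 1. \tag{10}$$
By (9), if $\mu < \infty$ then $S_c < \infty$ for every $c < 1$, hence for every $c > 0$ by monotonicity; by (10), if $\mu = \infty$ then $S_c = \infty$ for every $c > 1$, hence for every $c > 0$. Together, (9) and (10) thus yield the statement of the lemma. □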
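Lemma 2 can be illustrated numerically. The sketch below evaluates partial sums of $S_c = \sum_{n \ge 1} \mathbb{P}(X \ge cn)$ for two hypothetical laws: a geometric $X$ with $\mathbb{P}(X \ge k) = r^{k-1}$ for $k \ge 1$ (here $r = 0.9$, so $\mathbb{E}X = 1/(1-r) = 10$), for which the bounds (9) and (10) can be checked directly, and a heavy-tailed $X$ with $\mathbb{P}(X \ge k) = 1/k$, for which $\mathbb{E}X = \infty$ and the partial sums keep growing (logarithmically). Rational $c$ is passed as a `Fraction` so that $\lceil cn \rceil$ is computed exactly; all function names are our own.

```python
from fractions import Fraction
from typing import Callable


def partial_s(tail: Callable[[int], float], c: Fraction, n_max: int) -> float:
    """sum_{n=1}^{n_max} P(X >= c n); X is integer-valued, so P(X >= c n) = P(X >= ceil(c n))."""
    total = 0.0
    for n in range(1, n_max + 1):
        k = -(-c.numerator * n // c.denominator)  # exact ceil(c * n)
        total += tail(k)
    return total


def geometric_tail(k: int, r: float = 0.9) -> float:
    """P(X >= k) for X on {1, 2, ...} with P(X >= k) = r**(k - 1); here E X = 1 / (1 - r) = 10."""
    return 1.0 if k <= 1 else r ** (k - 1)


def heavy_tail(k: int) -> float:
    """P(X >= k) = 1 / k for k >= 1, so E X = sum_k 1/k = infinity."""
    return 1.0 if k <= 1 else 1.0 / k


mu = 10.0
s_small = partial_s(geometric_tail, Fraction(3, 10), 5_000)  # c < 1, m = 4 in (9)
s_large = partial_s(geometric_tail, Fraction(5, 2), 5_000)   # c > 1, m = 3 in (10)
print(s_small, "<=", 4 * mu)
print(s_large, ">=", (mu + 1) / 3 - 1)
print(partial_s(heavy_tail, Fraction(2), 1_000), partial_s(heavy_tail, Fraction(2), 100_000))
```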

Recall that $T_n$ is the set of species alive in the system at time $n$, so in
case of a death event,
$$|T_{n+1}| = \max\{0, |T_n| - X_n\}.$$
Moreover, assuming $p > p_c$, where $p_c$ is given by (1), and with $f$ as in (2), define for every $\varepsilon \in [0, 1 - f)$
$$L^{\varepsilon}_n := T_n \cap [0, f + \varepsilon) \quad\text{and}\quad R^{\varepsilon}_n := T_n \cap [f + \varepsilon, 1].$$
Note that $L^0_n = L_n$ and $R^0_n = R_n$. Also, define $A^{\varepsilon}_n$ as follows:
$$A^{\varepsilon}_n = \{\text{at time } n \text{ we kill all vertices in } L^{\varepsilon}_n\} = \{L^{\varepsilon}_{n+1} = \emptyset\}.$$
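Lemma 3. Suppose $\mu_Z = \mathbb{E}Z < \infty$, $\mu_X = \mathbb{E}X < \infty$ and $p > p_c$, and fix $\varepsilon \in (0, 1 - f)$. Then,
with probability 1, $A^{\varepsilon}_n$ occurs only finitely often.

Proof. First note that
$$|L^{\varepsilon}_{n+1}| = \begin{cases} |L^{\varepsilon}_n| + Y_n, \text{ where } Y_n \sim \mathrm{Binomial}(Z_n, f + \varepsilon), & \text{with probability } p, \\ \max\{|L^{\varepsilon}_n| - X_n, 0\}, & \text{with probability } q. \end{cases}$$
Now, let $Q_0 = 0$ and define $Q_n$ recursively by
$$Q_{n+1} - Q_n = \begin{cases} Y_n \sim \mathrm{Binomial}(Z_n, f + \varepsilon), & \text{with probability } p, \\ -X_n, & \text{with probability } q. \end{cases}$$
Thus $Q_n$ can be coupled with $|L^{\varepsilon}_n|$ in such a way that $|L^{\varepsilon}_n| \ge Q_n$ for all $n$.
On the other hand, $Q_n$ is a sum of $n$ i.i.d. random variables,
each with expectation $2\delta := p\mu_Z(f + \varepsilon) - q\mu_X = p\mu_Z\varepsilon > 0$ (recall that $p\mu_Z f = q\mu_X$ by (2)). By the strong
law of large numbers,
$$\lim_{n\to\infty} \frac{Q_n}{n} = 2\delta \text{ a.s.}, \quad\text{hence}\quad \liminf_{n\to\infty} \frac{|L^{\varepsilon}_n|}{n} \ge 2\delta \text{ a.s.},$$
which yields that with probability 1 there exists a (random) time $N_0 \in \mathbb{N}$ such that
$|L^{\varepsilon}_n| > n\delta$ for all $n \ge N_0$. Next, consider the events $B_n = A^{\varepsilon}_n \cap \{|L^{\varepsilon}_n| > n\delta\}$. On $\{|L^{\varepsilon}_n| > n\delta\}$ the set $L^{\varepsilon}_n$ can be emptied only by a death event with $X_n \ge |L^{\varepsilon}_n| > n\delta$, so
$$\mathbb{P}(B_n) \le q\,\mathbb{P}(X > n\delta) \le q\,\mathbb{P}(X \ge n\delta).$$
Consequently, by Lemma 2, since $\mathbb{E}X < \infty$,
$$\sum_{n=1}^{\infty} \mathbb{P}(B_n) \le q\sum_{n=1}^{\infty} \mathbb{P}(X \ge n\delta) < \infty,$$
so by the Borel-Cantelli lemma only finitely many $B_n$ occur, with probability 1. As $A^{\varepsilon}_n = B_n$ for all $n \ge N_0$, the events $A^{\varepsilon}_n$ occur only finitely often
with probability 1. □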
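The coupling used in this proof can be simulated directly: the same increments drive $|L^{\varepsilon}_n|$ (floored at 0 on deaths) and the free walk $Q_n$, so $|L^{\varepsilon}_n| \ge Q_n$ holds pathwise, and $Q_n/n$ should approach $2\delta = p\mu_Z\varepsilon$. The sketch simulates only the recursion for $|L^{\varepsilon}_n|$ displayed above. The laws $Z$ uniform on $\{1,2,3\}$ and $X$ uniform on $\{1,2\}$, $p = 0.7$ (so $f = 9/28$ by (2)) and $\varepsilon = 0.1$ are illustrative assumptions, giving $2\delta = 0.14$; `coupled_walks` is a hypothetical name.

```python
import random


def coupled_walks(p: float, f_eps: float, steps: int, seed: int = 0) -> tuple[int, int, bool]:
    """Drive |L^eps_n| and Q_n with the same increments; Z ~ U{1,2,3}, X ~ U{1,2}.

    Returns (|L^eps_n|, Q_n, whether |L^eps_k| >= Q_k held at every step).
    """
    rng = random.Random(seed)
    size_l, q_walk, dominated = 0, 0, True
    for _ in range(steps):
        if rng.random() < p:
            z = rng.randint(1, 3)
            y = sum(rng.random() < f_eps for _ in range(z))  # Y_n ~ Binomial(Z_n, f + eps)
            size_l += y
            q_walk += y
        else:
            x = rng.randint(1, 2)           # X_n
            size_l = max(size_l - x, 0)     # |L^eps| cannot go below 0
            q_walk -= x                     # Q_n is not floored
        dominated = dominated and size_l >= q_walk
    return size_l, q_walk, dominated


p, f, eps, steps = 0.7, 9 / 28, 0.1, 200_000
size_l, q_walk, dominated = coupled_walks(p, f + eps, steps)
print(size_l / steps, q_walk / steps, dominated)  # both ratios should be close to 2*delta = 0.14
```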
Recall that $\mu_Z = \mathbb{E}Z$ and $\mu_X = \mathbb{E}X$.
Theorem 4. The following hold.

(a) Suppose $\mu_Z = \infty$ and $\mu_X < \infty$. Then $T_n$ approaches a random sample
from $U[0, 1]$.

(b) Suppose $\mu_Z < \infty$. If $\mu_X < \infty$ and $p > p_c$, then $T_n$ approaches a random
sample from $U[f, 1]$, where $f$ is given by (2).

(c) Suppose $\mu_Z < \infty$. If $\mu_X < \infty$ and $p < p_c$, or if $\mu_X = \infty$, then $T_n = \emptyset$ for
infinitely many $n$.

Remark 1. Theorem 4 leaves some gaps: it covers neither the critical case
when $\mu_X, \mu_Z < \infty$ and $p = p_c$, nor the case where $\mu_X = \mu_Z = \infty$.

Proof. (a) Let $B_n$ be the set of all the particles born in the system by time
$n$ and let $D_n$ be the set of particles removed from the system by time $n$;
then $B_n$ is a collection of i.i.d. $U[0, 1]$ random variables and $T_n = B_n \setminus D_n$.
Since at each time $n$ we remove at most $X_n$ particles, and only in a death event, by the
strong law of large numbers we have
$$\limsup_{n\to\infty} \frac{|D_n|}{n} \le q\mu_X < \infty \text{ a.s.}, \quad\text{and}\quad \lim_{n\to\infty} \frac{|B_n|}{n} = p\mu_Z = \infty \text{ a.s.}$$
Therefore, since $D_n \subseteq B_n$,
$$\limsup_{n\to\infty} \frac{|T_n \,\triangle\, B_n|}{|B_n|} = \limsup_{n\to\infty} \frac{|D_n|}{|B_n|} = \limsup_{n\to\infty} \frac{|D_n|/n}{|B_n|/n} = 0 \text{ a.s.},$$
which yields the desired conclusion.

(b) Assume $\mu_X < \infty$ and $p \in (p_c, 1)$. Recall that $R'_n$ denotes the set of
vertices born in the system up to time $n$ that were assigned
a fitness in $[f, 1]$; thus $R'_n$ is a collection of i.i.d. $U[f, 1]$ random variables.
Moreover, as in the proof of Theorem 1, $|R'_n| \to \infty$. Fix $\varepsilon \in (0, 1 - f)$ and observe
that $R^{\varepsilon}_n \subseteq R_n \subseteq R'_n$. According to Lemma 3, there is a time $N_1$ such
that the events $A^{\varepsilon}_n$ do not occur for $n \ge N_1$. This implies that no vertices are
removed from $[f + \varepsilon, 1]$ at those times, and as a result
$$\sup_n \left|(R'_n \setminus R_n) \cap [f + \varepsilon, 1]\right| < \infty \text{ a.s.}$$
On the other hand, by the strong law, $|R'_n \cap [f, f + \varepsilon)|/n \to \varepsilon p\mu_Z$
a.s.; therefore
$$\limsup_{n\to\infty} \frac{|R'_n \setminus R_n|}{n} \le \limsup_{n\to\infty} \frac{|R'_n \setminus R^{\varepsilon}_n|}{n} \le \varepsilon p\mu_Z \text{ a.s.}$$
Since $\varepsilon > 0$ is arbitrary, we conclude that
$$\lim_{n\to\infty} \frac{|R'_n \setminus R_n|}{n} = 0 \text{ a.s.}$$
From the end of the proof of Theorem 1 we have $|R'_n|/n \to p\mu_Z(1 - f)$ a.s.
and $|L_n|/n \to 0$ a.s.; therefore
$$\limsup_{n\to\infty} \frac{|T_n \,\triangle\, R'_n|}{|R'_n|} = \limsup_{n\to\infty} \frac{|L_n| + |R'_n \setminus R_n|}{|R'_n|} \le \limsup_{n\to\infty} \frac{|L_n|/n}{|R'_n|/n} + \limsup_{n\to\infty} \frac{|R'_n \setminus R_n|/n}{|R'_n|/n} = 0,$$
which proves that $T_n$ approaches a random sample from $U[f, 1]$.

(c) Now suppose $\mu_X < \infty$ but $p < p_c$. Due to the renewal nature of the
process, it is sufficient to show that a.s. there exists at least one $n \ge 1$
such that $T_n = \emptyset$.

Let $W_n$ be i.i.d. random variables with the distribution given by
$$W_n = \begin{cases} Z_n & \text{with probability } p, \\ -X_n & \text{with probability } q. \end{cases}$$
Let $\tau = \inf\{n \ge 1 : W_1 + W_2 + \cdots + W_n \le 0\}$. Then, labelling the steps so that $W_n$ is the change in the number of vertices at the $n$-th step, the first time the system becomes empty again has the same distribution as $\tau$. Observe that by the strong law
$$\lim_{n\to\infty} \frac{W_1 + \cdots + W_n}{n} = \mathbb{E}W = p\mu_Z - q\mu_X = \mu_X\left(\frac{p}{p_c} - 1\right) < 0.$$
Therefore we must have $\tau < \infty$ a.s.
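The dichotomy between parts (b) and (c) shows up clearly in simulation. Since only $|T_n|$ matters for the event $T_n = \emptyset$, the sketch below tracks the size alone and counts empty times for $p$ below and above $p_c$. The laws $Z$ uniform on $\{1,2,3\}$ and $X$ uniform on $\{1,2\}$ are illustrative assumptions, giving $p_c = 3/7 \approx 0.43$ by (1); `count_empty_times` is a hypothetical name.

```python
import random


def count_empty_times(p: float, steps: int, seed: int = 0) -> int:
    """Count the n <= steps with |T_n| = 0, for Z ~ U{1,2,3} and X ~ U{1,2}."""
    rng = random.Random(seed)
    size, empty = 0, 0
    for _ in range(steps):
        if rng.random() < p:
            size += rng.randint(1, 3)                # birth of Z_n vertices
        else:
            size = max(size - rng.randint(1, 2), 0)  # death of X_n vertices
        if size == 0:
            empty += 1
    return empty


sub = count_empty_times(0.35, 20_000)  # p < p_c: negative drift, the system keeps emptying
sup = count_empty_times(0.70, 20_000)  # p > p_c: positive drift, only a few early empty times
print(sub, sup)
```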



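Finally, assume that $\mu_X = \infty$. By the strong law we have
$$\limsup_{n\to\infty} \frac{|T_n|}{n} \le \lim_{n\to\infty} \frac{|B_n|}{n} = p\mu_Z \text{ a.s.},$$
where $B_n$ is as in part (a); therefore, for $c = p\mu_Z + 1$, there is a (random) positive integer $N_3$ such that $|T_n| < cn$ for
all $n \ge N_3$. On the other hand, by Lemma 2,
$$\sum_n \mathbb{P}(\text{time } n \text{ is a death event and } X_n \ge cn) = q\sum_n \mathbb{P}(X \ge cn) = \infty,$$
and since these events are independent, by the second Borel-Cantelli
lemma there are infinitely many $n$ with a death event and $X_n \ge cn > |T_n|$, hence
$T_{n+1} = \emptyset$. □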